Speech intonation for TTS: study on evaluation methodology
Authors
Abstract
Intonation models are usually evaluated with non-referenced subjective tests (paired comparison or MOS), in which subjects rate the quality of samples or compare them without any explicit reference. These tests are usually conducted on isolated sentences. However, a single sentence with no contextual information admits multiple valid intonations, and a subject's preference within this range of intonation patterns may be highly personal. This paper investigates the degree to which this ambiguity about the appropriate intonation pattern affects the assessment of prosody in speech synthesis systems. To examine the problem, the variance of the F0 pattern of several vocoded sentences was modified, and subjects were asked to compare multiple versions with different levels of modification in terms of preference/quality. They were then presented with the reference, which defines the original intonation, and asked about each version's similarity to that reference. The results show that subjects can identify the samples with no F0 variance modification when given a reference, but they do not always prefer them. Thus, non-referenced tests with no context, though they may help to analyse user acceptability, may not be appropriate for measuring the performance of intonation models.
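The abstract does not say how the F0 variance was modified, so the following is only a minimal sketch of one plausible manipulation: scaling the deviations of the voiced F0 values around their mean. The function name `scale_f0_variance`, the convention that 0.0 marks unvoiced frames, and the linear-Hz domain are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import fmean


def scale_f0_variance(f0, factor):
    """Return a copy of an F0 contour whose voiced-frame variance is scaled.

    f0     -- list of F0 values in Hz; 0.0 marks an unvoiced frame (assumed convention)
    factor -- ratio of the new variance to the original variance (must be >= 0)
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")

    voiced = [v for v in f0 if v > 0]
    if not voiced:
        # Nothing to modify in a fully unvoiced contour.
        return list(f0)

    mean = fmean(voiced)
    # Scaling deviations by sqrt(factor) scales the variance by factor.
    gain = factor ** 0.5

    # Unvoiced frames are left at 0.0; the mean of the voiced frames is preserved.
    return [mean + gain * (v - mean) if v > 0 else 0.0 for v in f0]


contour = [0.0, 100.0, 120.0, 140.0, 0.0]
flattened = scale_f0_variance(contour, 0.25)  # quarter of the original variance
```

A factor of 1.0 leaves the contour unchanged, and 0.0 yields a monotone contour at the mean. Large factors can push low F0 values to zero or below in this linear-Hz sketch; applying the same scaling to log-F0 would be one way to avoid that.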
Related resources
Towards an intonation module for a Portuguese TTS system
In this paper, a correlation between the linguistic structure of the written text and the real intonation behavior of the read speech in European Portuguese language (EP) is presented. It is our belief that intonation behavior in EP can be strongly predicted from two main coordinates: the syntactic structure of the sentence and its pragmatic communicative function, in one way, combined with the...
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...
Maximum-likelihood dynamic intonation model for concatenative text-to-speech system
In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...
Modeling of intonation bearing emphasis for TTS-synthesis of Greek dialogues
TTS-synthesis of neutral style Greek with good intelligibility and quality has been achieved some time ago. As a further step towards expanding the applications domain of the TTS-system developed in our laboratory, the incorporation of emphasis into speech used in man-machine dialogues according to their context has been studied recently. In this paper the method applied for the analysis of int...
A joint prosody evaluation of French text-to-speech synthesis systems
This paper reports on prosodic evaluation in the framework of the EVALDA/EvaSy project for text-to-speech (TTS) evaluation for the French language. Prosody is evaluated using a prosodic transplantation paradigm. Intonation contours generated by the synthesis systems are transplanted on a common segmental content. Both diphone based synthesis and natural speech are used. Five TTS systems are tes...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014